note

  • This isn't meant to be a generalized NN implementation. Just plow through this fixed 400 > 25 > 10 setup to get a feeling for how the network works

In [1]:
%reload_ext autoreload
%autoreload 2

import sys
sys.path.append('..')

from helper import nn
from helper import logistic_regression as lr
import numpy as np

prepare data


In [2]:
X_raw, y_raw = nn.load_data('ex4data1.mat', transpose=False)
X = np.insert(X_raw, 0, np.ones(X_raw.shape[0]), axis=1)
X.shape


Out[2]:
(5000, 401)


In [3]:
y_raw


Out[3]:
array([10, 10, 10, ...,  9,  9,  9], dtype=uint8)

In [4]:
y = nn.expand_y(y_raw)
y


Out[4]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])
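
So expand_y one-hot encodes the labels: each label becomes a 10-dim row with a single 1, and label 10 (which stands for the digit 0 in this dataset) maps to the last column, as Out[4] shows. A minimal sketch of such a helper (the real one lives in helper/nn.py and may be written differently):

def expand_y(y):
    # one 10-dim one-hot row per example; label k lights up column k-1
    result = []
    for label in y:
        row = np.zeros(10)
        row[label - 1] = 1   # e.g. label 10 -> index 9, as in Out[4] above
        result.append(row)
    return np.array(result)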

load weights


In [5]:
t1, t2 = nn.load_weight('ex4weights.mat')
t1.shape, t2.shape


Out[5]:
((25, 401), (10, 26))

In [6]:
theta = nn.serialize(t1, t2)  # flatten params
theta.shape


Out[6]:
(10285,)
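
serialize just unrolls both weight matrices into one flat vector (25*401 + 10*26 = 10285), the shape scipy's optimizers expect. A sketch, together with an assumed inverse deserialize (a hypothetical name here, hard-coded to this exercise's shapes) that the later steps need:

def serialize(a, b):
    # unroll both matrices into a single 1d vector
    return np.concatenate((np.ravel(a), np.ravel(b)))

def deserialize(seq):
    # recover t1 (25, 401) and t2 (10, 26) from the flat vector
    return seq[:25 * 401].reshape(25, 401), seq[25 * 401:].reshape(10, 26)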

feed forward

(400 + 1) -> (25 + 1) -> (10)


In [7]:
_, _, _, _, h = nn.feed_forward(theta, X)
h # 5000*10


Out[7]:
array([[  1.12661530e-04,   1.74127856e-03,   2.52696959e-03, ...,
          4.01468105e-04,   6.48072305e-03,   9.95734012e-01],
       [  4.79026796e-04,   2.41495958e-03,   3.44755685e-03, ...,
          2.39107046e-03,   1.97025086e-03,   9.95696931e-01],
       [  8.85702310e-05,   3.24266731e-03,   2.55419797e-02, ...,
          6.22892325e-02,   5.49803551e-03,   9.28008397e-01],
       ..., 
       [  5.17641791e-02,   3.81715020e-03,   2.96297510e-02, ...,
          2.15667361e-03,   6.49826950e-01,   2.42384687e-05],
       [  8.30631310e-04,   6.22003774e-04,   3.14518512e-04, ...,
          1.19366192e-02,   9.71410499e-01,   2.06173648e-04],
       [  4.81465717e-05,   4.58821829e-04,   2.15146201e-05, ...,
          5.73434571e-03,   6.96288990e-01,   8.18576980e-02]])
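
A sketch of that (400 + 1) -> (25 + 1) -> (10) pass (hedged: the real nn.feed_forward may differ in detail, but it clearly returns the intermediate values as well, since backpropagation will need them later):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(theta, X):
    t1 = theta[:25 * 401].reshape(25, 401)   # undo serialize
    t2 = theta[25 * 401:].reshape(10, 26)
    a1 = X                                   # 5000 x 401, bias column already inserted
    z2 = a1.dot(t1.T)                        # 5000 x 25
    a2 = np.insert(sigmoid(z2), 0, np.ones(z2.shape[0]), axis=1)  # add bias -> 5000 x 26
    z3 = a2.dot(t2.T)                        # 5000 x 10
    h = sigmoid(z3)                          # 5000 x 10, one probability per class
    return a1, z2, a2, z3, h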

cost function

think about it: we now have $y$ and $h_{\theta}$, both in $R^{5000 \times 10}$
if you ignore the $m$ and $K$ dimensions for a moment, the computation is element-wise and trivial
the equation: $\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log(h_{\theta}(x^{(i)})_k) - (1-y_k^{(i)}) \log(1-h_{\theta}(x^{(i)})_k) \right]$
in other words: compute $-y \log(h_{\theta}) - (1-y) \log(1-h_{\theta})$ element-wise, sum the resulting 2d array up, and divide by $m$
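
As a sketch, that recipe translates almost line for line into numpy (this reuses nn.feed_forward from above; the actual nn.cost in helper/nn.py presumably does the equivalent):

def cost(theta, X, y):
    m = X.shape[0]
    _, _, _, _, h = nn.feed_forward(theta, X)   # h is 5000 x 10
    # element-wise over the whole 5000 x 10 array, then one big sum
    pair_computation = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return pair_computation.sum() / m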


In [8]:
nn.cost(theta, X, y)


Out[8]:
0.28762916516131892

regularized cost function

the first columns of t1 and t2 hold the intercept (bias) terms $\theta_0$; just leave them out when you compute the regularization term
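
A sketch under that rule (assuming the exercise default $\lambda = 1$; the slicing t1[:, 1:] is what drops the bias column):

def regularized_cost(theta, X, y, l=1):
    m = X.shape[0]
    t1 = theta[:25 * 401].reshape(25, 401)   # undo serialize
    t2 = theta[25 * 401:].reshape(10, 26)
    # penalize every weight except the bias column of each layer
    reg_t1 = (l / (2 * m)) * np.power(t1[:, 1:], 2).sum()
    reg_t2 = (l / (2 * m)) * np.power(t2[:, 1:], 2).sum()
    return nn.cost(theta, X, y) + reg_t1 + reg_t2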


In [9]:
nn.regularized_cost(theta, X, y)


Out[9]:
0.38376985909092365
